In this project, I created a traffic light classifier that can identify the state of a traffic light using the TensorFlow Object Detection API. This API provides a few pre-trained models that are capable of localizing a traffic light in an image, but not of classifying its state (green, yellow, red, etc.). My classifier is able to both detect a traffic light in an image and classify its color. Afterwards, I built a pipeline to detect and classify traffic lights in videos, and this pipeline achieved 100% accuracy in detecting and classifying traffic lights.
import numpy as np
import pandas as pd
import tensorflow as tf
import os
from matplotlib import pyplot as plt
from PIL import Image, ImageDraw, ImageColor
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
We used the Udacity Camera Feed data (3.49 GB compressed), which contains two .bag files - one is just_traffic_light.bag and the other is loop_with_traffic_light.bag - with an image feed from the Udacity self-driving car's camera in the test lot and a topic containing the car's position. It can be downloaded from Google Drive.
Use the command lines below to extract color images from the bag files and save them all in one folder; the Python script can be downloaded here:
python bag_to_images.py --bag_file just_traffic_light.bag --output_dir test_images/ --file_prefix just_traffic_light_
python bag_to_images.py --bag_file loop_with_traffic_light.bag --output_dir test_images/ --file_prefix loop_with_traffic_light_
src_dir = 'tensorflow/research/object_detection/'
# Size, in inches, of the output images.
IMAGE_SIZE = (20, 15)
udacity_train = src_dir + 'test_images/just_traffic_light_0340.jpeg'
image1 = Image.open(udacity_train)
plt.figure(figsize=IMAGE_SIZE)
plt.imshow(image1)
Note that some images do not contain a traffic light. See an example below.
udacity_train = src_dir + 'test_images/loop_with_traffic_light_0000.jpeg'
image2 = Image.open(udacity_train)
plt.figure(figsize=IMAGE_SIZE)
plt.imshow(image2)
A few pre-trained models can be downloaded from the TensorFlow detection model zoo. In this project, we used a Faster R-CNN model, which can be downloaded from here (746.4 MB). This model is trained on the COCO dataset and is able to detect 90 classes of objects. Label maps map indices to category names, so that when our convolutional network predicts 10, we know that this corresponds to traffic lights. Label maps can be downloaded here. Files in the faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 folder are shown below. Note that we put the label map in the same folder as the model. Also, we downloaded all files in the research folder. Two files - label_map_util.py and visualization_utils.py - will be used in this section, and more files will be used in the next section.
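To make the role of the label map concrete, here is a minimal hand-built sketch of a category index (the real one is produced by label_map_util from the .pbtxt file; the entries below are only illustrative):

```python
# Minimal sketch of a category index: it maps a predicted class id to a
# human-readable display name. In the COCO label map, id 10 is "traffic light".
category_index = {
    1: {"id": 1, "name": "person"},
    10: {"id": 10, "name": "traffic light"},
}

def class_name(class_id):
    """Look up the display name for a predicted class id."""
    return category_index.get(class_id, {"name": "unknown"})["name"]

print(class_name(10))   # traffic light
print(class_name(42))   # unknown
```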
import visualization_utils as vis_util
import label_map_util
FASTER_RCNN = src_dir + 'models/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28/'
FASTER_RCNN_GRAPH_FILE = FASTER_RCNN + 'frozen_inference_graph.pb'
print("Files extracted from compressed .tar.gz file:\n")
for file in os.listdir(FASTER_RCNN):
    print(file)
Now we want to load the graph and do traffic light detection on our data.
# we put it in the same directory as the model directory
PATH_TO_LABELS = FASTER_RCNN + 'mscoco_label_map.pbtxt'
GRAPH_FILE = FASTER_RCNN_GRAPH_FILE
NUM_CLASSES = 90
class Classifier(object):
    def __init__(self):
        # Build the category index from the label map.
        label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
        categories = label_map_util.convert_label_map_to_categories(
            label_map,
            max_num_classes=NUM_CLASSES,
            use_display_name=True)
        self.category_index = label_map_util.create_category_index(categories)
        # Load the frozen inference graph.
        self.detection_graph = tf.Graph()
        with self.detection_graph.as_default():
            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(GRAPH_FILE, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')
            self.image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
            self.d_boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
            self.d_scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
            self.d_classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
            self.num_d = self.detection_graph.get_tensor_by_name('num_detections:0')
        self.sess = tf.Session(graph=self.detection_graph)

    def load_image_into_numpy_array(self, image):
        (im_width, im_height) = image.size
        return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)

    def get_classification(self, img):
        image = self.load_image_into_numpy_array(img)
        # Bounding box detection.
        with self.detection_graph.as_default():
            # Expand dimensions since the model expects images to have shape [1, None, None, 3].
            img_expanded = np.expand_dims(image, axis=0)
            (boxes, scores, classes, num) = self.sess.run(
                [self.d_boxes, self.d_scores, self.d_classes, self.num_d],
                feed_dict={self.image_tensor: img_expanded})
        boxes = np.squeeze(boxes)
        scores = np.squeeze(scores)
        classes = np.squeeze(classes).astype(np.int32)
        # Draw the detections on the image.
        vis_util.visualize_boxes_and_labels_on_image_array(
            image, boxes, classes, scores,
            self.category_index,
            use_normalized_coordinates=True,
            line_thickness=8)
        plt.figure(figsize=IMAGE_SIZE)
        plt.imshow(image)
        return boxes, scores, classes
class1 = Classifier()
boxes, scores, classes = class1.get_classification(image1)
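Many of the detections returned by get_classification have low confidence. A small sketch of keeping only confident traffic-light detections (the 0.5 threshold and the helper itself are illustrative choices, not part of the API):

```python
import numpy as np

def filter_detections(boxes, scores, classes, min_score=0.5, keep_class=10):
    """Keep only detections of one class above a confidence threshold.
    Class id 10 is "traffic light" in the COCO label map."""
    mask = (scores >= min_score) & (classes == keep_class)
    return boxes[mask], scores[mask], classes[mask]

# Toy example with two detections; only the confident traffic light survives.
boxes = np.array([[0.1, 0.1, 0.3, 0.2], [0.5, 0.5, 0.9, 0.9]])
scores = np.array([0.92, 0.31])
classes = np.array([10, 10], dtype=np.int32)
fb, fs, fc = filter_detections(boxes, scores, classes)
print(fb.shape)  # (1, 4)
```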
From the image shown above, it is clear that the pre-trained model can detect traffic lights well, and we want to further tune it so that it is able to both detect and classify traffic lights. Now our question is: how can we fine-tune a pre-trained model that was designed to work on the 90 classes of the COCO dataset to work on the 4 classes of our traffic light dataset? The next part of this project focuses on solving this problem.
In general, there are a few steps that we followed to classify the color of a traffic light in an image using Tensorflow Object Detection API.
Download everything in git repo and follow the installation instruction here to install necessary packages.
protoc object_detection/protos/*.proto --python_out=.
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
python object_detection/builders/model_builder_test.py
Select a pre-trained model.
Split this data into train/val/test samples
Generate TF Records from these splits
Setup a .config file for the model of choice
Train
Export graph from new trained model
Detect custom objects in a video!
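The split step above can be sketched as follows (file names, ratios, and the fixed seed are illustrative; in this project the two .bag files themselves served as the train/validation split):

```python
import random

def split_dataset(filenames, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle file names deterministically and split into train/val/test lists."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * train_frac)
    n_val = int(len(files) * val_frac)
    return (files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:])

images = ["img_%04d.jpeg" % i for i in range(100)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 80 10 10
```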
We encountered an issue while installing and using the API; the solution is shown below:
Use protoc 3.2.0 rather than 2.6.1. Use the commands below to uninstall protoc 2.6.1 and install protoc 3.2.0:
sudo apt-get remove protobuf-compiler
curl -OL https://github.com/google/protobuf/releases/download/v3.2.0/protoc-3.2.0-linux-x86_64.zip
unzip protoc-3.2.0-linux-x86_64.zip -d protoc3
sudo mv protoc3/bin/* /usr/local/bin/
sudo mv protoc3/include/* /usr/local/include/
The detailed steps of using the API are shown below.
In this project, we kept the same Faster R-CNN model that was used in the first part.
The raw data consists of two .bag files, so we used one for training and the other for validation. We followed the API instructions on how to create .record files from an image dataset; the data requirements are shown below:
For every example in the dataset, we should have the following information:
# csv file
data = pd.read_csv(src_dir + "data/just_traffic_light.csv")
data.head()
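Each TF Record example stores the image size, the class label, and bounding-box corners normalized to [0, 1]. A sketch of that conversion (the column names and values here are assumptions and may differ from the actual CSV):

```python
def normalize_box(row):
    """Convert pixel box coordinates to the normalized [0, 1]
    coordinates that a TF Record example stores."""
    return {
        "filename": row["filename"],
        "class": row["class"],
        "xmin": row["xmin"] / row["width"],
        "xmax": row["xmax"] / row["width"],
        "ymin": row["ymin"] / row["height"],
        "ymax": row["ymax"] / row["height"],
    }

# Hypothetical annotation row for a green light.
row = {"filename": "just_traffic_light_0340.jpeg", "width": 800, "height": 600,
       "class": "Green", "xmin": 400, "xmax": 440, "ymin": 120, "ymax": 210}
ex = normalize_box(row)
print(ex["xmin"], ex["ymax"])  # 0.5 0.35
```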
The commands we used to generate .record files are shown below (run in folder object_detection):
python generate_data_sets.py
python generate_data_sets.py --input_file data/just_traffic_light.csv --output_file data/tl_val.record
We changed the model configuration file pipeline.config from 100 to 4 max predictions and the number of labels from 90 to 4 to reduce prediction time. This modified version is called tl.config, which can be downloaded here.
num_classes: 4
first_stage_max_proposals: 4
second_stage_post_processing
max_detections_per_class: 3
max_total_detections: 4
second_stage_batch_size: 4 (below second_stage_classification_loss_weight: 1.0)
num_steps: 1000 -- This number really depends on the size of the dataset along with a number of other factors (including how long we are willing to let the model train for). Once we start training we see how long it’s taking for each training step and adjust num_steps accordingly.
num_examples: number of evaluation samples, in this case it is 710
Note that every PATH_TO_BE_CONFIGURED placeholder needs to be changed to a real path.
Remove the block below from the config file; it appears after the line initial_learning_rate: 0.000300000014249
schedule {
step: 0
learning_rate: 0.000300000014249
}
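As a back-of-the-envelope check on num_steps (the batch size of 1 is the Faster R-CNN default and assumed here, as is the example image count):

```python
import math

def epochs_for_steps(num_steps, num_examples, batch_size=1):
    """Approximate number of full passes over the training set
    that a given num_steps corresponds to."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return num_steps / steps_per_epoch

# With num_steps = 1000 and, say, 500 training images at batch size 1,
# training makes about 2 passes over the data.
print(epochs_for_steps(1000, 500))  # 2.0
```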
To solve a Python 3 incompatibility issue, we need to modify models/research/object_detection/utils/learning_schedules.py lines 167-169. Currently it is
`rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),
range(num_boundaries),
[0] * num_boundaries))`
Wrap list() around the range() like this:
`rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),
list(range(num_boundaries)),
[0] * num_boundaries))`
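The root cause is that in Python 3, range() returns a lazy range object rather than a list, so the two branches handed to tf.where no longer have matching types. A quick illustration:

```python
num_boundaries = 3

r = range(num_boundaries)   # Python 3: a range object, not a list
l = [0] * num_boundaries    # a plain list

print(type(r) is list)        # False
print(list(r) == [0, 1, 2])   # True: list() restores the Python 2 behavior
```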
The step-by-step instructions are shown below:
item {
  id: 1
  name: "Red"
}
item {
  id: 2
  name: "Yellow"
}
item {
  id: 3
  name: "Green"
}
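As a minimal sketch of how such a label map turns into the category index used earlier (the real code parses the .pbtxt with protobufs via label_map_util; this regex version is only illustrative):

```python
import re

LABEL_MAP = '''
item {
  id: 1
  name: "Red"
}
item {
  id: 2
  name: "Yellow"
}
item {
  id: 3
  name: "Green"
}
'''

def parse_label_map(text):
    """Parse a simple .pbtxt label map into {id: {"id": ..., "name": ...}}."""
    pattern = r'item\s*{\s*id:\s*(\d+)\s*name:\s*"([^"]+)"\s*}'
    return {int(i): {"id": int(i), "name": name}
            for i, name in re.findall(pattern, text)}

index = parse_label_map(LABEL_MAP)
print(index[2]["name"])  # Yellow
```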
Copy the data/ directory to the object_detection/ directory. Create a new folder tl_classification/ and copy tl.config into it. Our folder structure is as follows:
Note that both train.py and export_inference_graph.py are from object_detection folder.
python train.py --logtostderr --train_dir=tl_classification --pipeline_config_path=tl_classification/tl.config
To visualize the training process: tensorboard --logdir=tl_classification
If training does not start, adding the two lines below to the config file, between gradient_clipping_by_norm and fine_tune_checkpoint, will make it work:
batch_queue_capacity: 2
prefetch_queue_capacity: 2
The value 2 above is only a starting value to get training to begin. The defaults for those two settings are 8 and 10 respectively, and increasing the values should help speed up training.

After the training is complete, freeze the best model using the highest checkpoint number (1000 in this example). Inside folder object_detection:
ls tl_classification/model.ckpt*
python export_inference_graph.py --input_type image_tensor --pipeline_config_path tl_classification/tl.config --trained_checkpoint_prefix tl_classification/model.ckpt-1000 --output_directory tl_classification_final
The COCO API must be installed in order to run the evaluation:
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools <path_to_tensorflow>/models/research/
Inside folder object_detection:
python eval.py --logtostderr --pipeline_config_path=tl_classification/tl.config --checkpoint_dir=./tl_classification --eval_dir=./tl_classification_eval
To visualize the eval results
tensorboard --logdir=./tl_classification_eval
Let's look at the performance of our classifier on a single image.
PATH_TO_LABELS = src_dir + 'data/tl_label_map.pbtxt'
GRAPH_FILE = src_dir + "tl_classification_final/frozen_inference_graph_old.pb"
NUM_CLASSES = 4
class2 = Classifier()
boxes, scores, classes = class2.get_classification(image1)
This trained model is able to locate the traffic light in an image and classify its color.
# Import everything needed to edit/save/watch video clips
from moviepy.editor import VideoFileClip, ImageSequenceClip
from IPython.display import HTML
import glob
We first combine all images into videos; after running the code below we have two videos, just.mp4 and loop.mp4.
def image_to_video(images_folder, prefix):
    # List of image paths with the given prefix.
    myfiles = glob.glob(images_folder + prefix + "*.jpeg")
    myfiles.sort()
    # Combine the images into a video.
    foutput = images_folder + prefix + '.mp4'
    clip = ImageSequenceClip(myfiles, load_images=True, fps=25)
    # Write the video file.
    clip.write_videofile(foutput)
image_to_video(src_dir + "test_images/", "just")
image_to_video(src_dir + "test_images/", "loop")
We created the pipeline below to detect and classify traffic lights in videos; the results are promising.
finput1 = src_dir + "test_images/just.mp4"
foutput1 = src_dir + "test_images/just_detection.mp4"
finput2 = src_dir + "test_images/loop.mp4"
foutput2 = src_dir + "test_images/loop_detection.mp4"
def pipeline(img):
    # The array-based representation of the image is used to prepare the
    # result image with boxes and labels on it.
    draw_img = img
    # Actual detection. Expand dimensions since the model expects images
    # to have shape [1, None, None, 3].
    (boxes, scores, classes, num) = sess.run(
        [detection_boxes, detection_scores, detection_classes, num_detections],
        feed_dict={image_tensor: np.expand_dims(img, 0)})
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        draw_img, np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    return draw_img

def load_graph(graph_file):
    """Loads a frozen inference graph."""
    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(graph_file, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')
    return graph
detection_graph = load_graph(GRAPH_FILE)
with tf.Session(graph=detection_graph) as sess:
    # Define input and output tensors for detection_graph.
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # The score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    clip = VideoFileClip(finput1)
    print("video size: ", clip.size)
    new_clip = clip.fl_image(pipeline)
    %time new_clip.write_videofile(foutput1, audio=False)
HTML("""
<video width="1000" height="800" controls>
<source src="{0}" type="video/mp4">
</video>
""".format('https://github.com/wzding/Electric_Eel_Capstone/blob/master/video/just_traffic_light_detection.mp4?raw=true'))
with tf.Session(graph=detection_graph) as sess:
    # Define input and output tensors for detection_graph.
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # The score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    clip = VideoFileClip(finput2)
    print("video size: ", clip.size)
    new_clip = clip.fl_image(pipeline)
    %time new_clip.write_videofile(foutput2, audio=False)
HTML("""
<video width="1000" height="800" controls>
<source src="{0}" type="video/mp4">
</video>
""".format('https://github.com/wzding/Electric_Eel_Capstone/blob/master/video/loop_with_traffic_light_detection.mp4?raw=true'))
A few resources I used in this project:
Classification tutorial: https://pythonprogramming.net/creating-tfrecord-files-tensorflow-object-detection-api-tutorial/